Large scale K-means clustering using GPUs

Abstract

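The k-means algorithm is widely used for clustering, compressing, and summarizing vector data. We present a fast and memory-efficient GPU-based algorithm for exact k-means, Asynchronous Selective Batched K-means (ASB K-means). Unlike most GPU-based k-means algorithms, which require loading the whole dataset onto the GPU, the amount of GPU memory required to run our algorithm can be chosen to be much smaller than the size of the dataset. Thus, our algorithm can cluster datasets whose size exceeds the available GPU memory. The algorithm works in a batched fashion and applies the triangle inequality in each k-means iteration to omit a data point if its membership assignment, i.e., the cluster it belongs to, remains unchanged, thus significantly reducing the number of data points that need to be transferred between the CPU's RAM and the GPU's global memory, and enabling the algorithm to process large datasets very efficiently. Our algorithm can be substantially faster than a GPU-based implementation of standard k-means even in situations when application of the standard algorithm is feasible because the whole dataset fits into GPU memory. Experiments show that ASB K-means can run up to 15 times faster than a standard GPU-based implementation of k-means, and it also outperforms the GPU-based k-means implementation in NVIDIA's open-source RAPIDS machine learning library on all the datasets used in our experiments.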
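The selective, batched idea described in the abstract can be illustrated with a small CPU-only sketch. It is not the authors' implementation: it assumes the classic Elkan-style triangle-inequality test (a point cannot change cluster while an upper bound on its distance to its own center is at most half the distance from that center to the nearest other center), and it only counts the points that would be sent to the GPU instead of performing real transfers. All function names and the `batch_size` parameter are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def half_min_center_gap(centers):
    """s[j]: half the distance from center j to its nearest other center."""
    k = len(centers)
    return [0.5 * min((dist(centers[j], centers[i]) for i in range(k) if i != j),
                      default=math.inf)
            for j in range(k)]

def kmeans_selective(points, centers, batch_size, max_iter=100):
    """Exact Lloyd iterations that skip points proven to keep their cluster.

    Returns (centers, assign, transferred), where transferred[t] counts the
    points that had to be re-examined (i.e. "transferred") in iteration t.
    """
    n, k = len(points), len(centers)
    assign = [0] * n
    upper = [math.inf] * n  # infinite bounds force a full first pass
    transferred = []
    for _ in range(max_iter):
        s = half_min_center_gap(centers)
        # Assumed test: if upper[i] <= s[assign[i]], the triangle inequality
        # guarantees that point i stays in its cluster, so it is skipped.
        todo = [i for i in range(n) if upper[i] > s[assign[i]]]
        transferred.append(len(todo))
        # Process the selected points batch by batch (stand-in for GPU batches).
        for start in range(0, len(todo), batch_size):
            for i in todo[start:start + batch_size]:
                d = [dist(points[i], c) for c in centers]
                j = min(range(k), key=d.__getitem__)
                assign[i], upper[i] = j, d[j]
        # Recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [points[i] for i in range(n) if assign[i] == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        # A center that moved by shift[j] can raise a member's distance by at
        # most shift[j], so the upper bounds are loosened accordingly.
        shift = [dist(c, nc) for c, nc in zip(centers, new_centers)]
        upper = [u + shift[a] for u, a in zip(upper, assign)]
        centers = new_centers
        if max(shift) == 0.0:
            break
    return centers, assign, transferred
```

For two well-separated groups of three points each, with one initial center placed in each group, this sketch examines all six points in the first iteration and none in the second, which is the effect the abstract attributes to the triangle-inequality filter.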

Similar resources

A Large Scale Clustering Scheme for Kernel K-Means

Kernel functions can be viewed as a non-linear transformation that increases the separability of the input data by mapping them to a new high-dimensional space. The incorporation of a kernel function enables the K-Means algorithm to explore the inherent data pattern in the new space. However, the recent applications of the kernel K-Means algorithm are confined to small corpora due to its expensive com...

Distributed Kernel K-Means for Large Scale Clustering

Clustering samples according to an effective metric and/or vector space representation is a challenging unsupervised learning task with a wide spectrum of applications. Among several clustering algorithms, k-means and its kernelized version still have a wide audience because of their conceptual simplicity and efficacy. However, the systematic application of the kernelized version of k-means is ...

k²-means for fast and accurate large scale clustering

We propose k²-means, a new clustering method which efficiently copes with large numbers of clusters and achieves low energy solutions. k²-means builds upon the standard k-means (Lloyd's algorithm) and combines a new strategy to accelerate the convergence with a new low time complexity divisive initialization. The accelerated convergence is achieved through only looking at k_n nearest clusters and ...

Genetic Weighted K-means for Large-Scale Clustering Problems

This paper proposes a genetic weighted K-means algorithm called GWKMA, which is a hybridization of a genetic algorithm (GA) and a weighted K-means algorithm (WKMA). GWKMA encodes each individual by a partitioning table which uniquely determines a clustering, and employs three genetic operators (selection, crossover, mutation) and a WKMA operator. The superiority of the GWKMA over the WKMA and o...

A Parallel Implementation of K-Means Clustering on GPUs

Graphics Processing Units (GPUs) have recently been the subject of attention in research as an efficient coprocessor for implementing many classes of highly parallel applications. The GPU's design is engineered for graphics applications, where many independent SIMD workloads are simultaneously dispatched to processing elements. While parallelism has been explored in the context of traditional CPU...

Journal

Journal title: Data Mining and Knowledge Discovery

Year: 2022

ISSN: 1573-756X, 1384-5810

DOI: https://doi.org/10.1007/s10618-022-00869-6